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Abstract 

Background: Multiple myeloma (MM) is a malignant proliferation of plasma B cells. Based on recurrent aneuploidy 
such as copy number alterations (CNAs), myeloma is divided into two subtypes with different CNA patterns and 
patient survival outcomes. How aneuploidy events arise, and whether they contribute to cancer cell evolution are 
actively studied. The large amount of transcriptomic changes resultant of CNAs (dosage effect) pose big challenges 
for identifying functional consequences of CNAs in myeloma in terms of specific driver genes and pathways. In this 
study, we hypothesize that gene-wise dosage effect varies as a result from complex regulatory networks that 
translate the impact of CNAs to gene expression, and studying this variation can provide insights into functional 
effects of CNAs. 

Results: We propose gene-wise dosage effect score and genome-wide l<aiyotype plot as tools to measure and 
visualize concordant copy number and expression changes across cancer samples. We find that dosage effect in 
myeloma is widespread yet variable, and it is correlated with gene expression level and CNA frequencies in 
different chromosomes. Our analysis suggests that despite the enrichment of differentially expressed genes 
between hyperdiploid MM and non-hyperdiploid MM in the trisomy chromosomes, the chromosomal proportion of 
dosage sensitive genes is higher in the non-trisomy chromosomes. Dosage-sensitive genes are enriched by genes 
with protein translation and localization functions, and dosage resistant genes are enriched by apoptosis genes. 
These results point to future studies on differential dosage sensitivity and resistance of pro- and anti-proliferation 
pathways and their variation across patients as therapeutic targets and prognosis markers. 

Conclusions: Our findings support the hypothesis that recurrent CNAs in myeloma are selected by their functional 
consequences. The novel dosage effect score defined in this work will facilitate integration of copy number and 
expression data for identifying driver genes in cancer genomics studies. The accompanying R code is available at 
http://www.canevolve.org/dosageEffect/. 
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Background 

Multiple myeloma (MM or myeloma) is a malignant pro- 
liferation of plasma B cells that contain prevalent genomic 
alterations [1,2]. Based on copy number alterations and 
translocations, myeloma is divided into two main sub- 
types. The hyperdiploid multiple myeloma (HMM) is 
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characterized by the trisomy (three copies) of eight 
chromosomes, 3, 5, 7, 9, 11, 15, 19 and 21, and the non- 
hyperdiploid multiple myeloma (NHMM) is often associ- 
ated with chromosome 13 hemizygous deletion and 
translocations between the immunoglobulin heavy chain 
gene and proto-oncogenes [3]. The copy number alter- 
ations (CNAs) in other chromosome regions such as Iq, 
6q, 8p, and 16q occur in both subtypes (Figure 1). 

Excessively present in myeloma, CNAs are also fre- 
quent genomic alteration events in most hematological 
cancers and many solid cancers [4]. How CNAs arise and 
whether they contribute to cancer cell evolution is actively 
debated [5,6]. Some argue that CNAs are a benign side 
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Figure 1 SNP copy number-based clustering provides an overall view of myeloma genomic heterogeneity and subtypes. Myeloma 
samples in the IFM dataset are clustered based on their SNP-array copy numbers from left to right. SNPs are ordered by chromosome positions 
from top to bottom. Blue color indicates copy number loss, white color indicates copy number close to normal, and red color indicates copy 
number gains. The horizontal bars on the top indicate samples clusters with copy number features of NHMM (blue) and HMM (red). 



effect of genomic instability during cancer development, 
which results in many types of mutations and chromo- 
some mis-segregation [7]. Others propose that CNAs have 
functional consequences on survival and proliferation of 
cancer cells [8,9], and therefore could be subjected to se- 
lection during cancer cell evolution. The eight non- 
random trisomy chromosomes that characterize HMMs, 
and better patient survival rates in HMM than in NHMM 



support the functional selection of CNAs in myeloma 
[10]. More substantial analysis is needed to identify candi- 
date driver genes or pathways affected by CNAs and de- 
sign experiments that interfere with these candidates to 
study their functional effects on cancer cells. However, 
thousands of genes located in the CNA regions in mye- 
loma have altered mRNA expression levels directly in- 
duced by copy number changes, a phenomenon termed 
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copy number dosage effect [11]. In addition, among the 
genes with dosage effect, transcription factors and epigen- 
etic modifiers could further modulate mRNA expression 
of their many downstream genes [12]. These large number 
of directly and indirectly affected genes by CNAs pose big 
challenges for identifying functional consequences of 
CNAs in myeloma in terms of specific driver genes and 
pathways, and for studying the interplay between these 
driver genes and other genetic mutations during myeloma 
cell evolution. 

Technological development in genomics has promoted 
multi-scale and integrative studies of cancer [13]. In par- 
ticular, paired gene expression and copy number profiling 
for the same cancer samples has helped identifying driver 
cancer genes in CNA regions, expression signatures that 
correlate with particular CNAs, and new cancer subtypes 
[4,14-16]. Copy number dosage effect has been a direct 
or implied question in such studies, which have shown 
the dosage effect of genes in various cancer types such as 
leukemia, lymphoma, liver cancer and breast cancer 
[17-20]. In myeloma, Dickens and colleagues have identi- 
fied 170 genes with homozygous deletions and corre- 
sponding loss of expression using paired gene expression 
and copy number alteration profiles [21]. Cell death net- 
work is over-represented in these genes, and patients 
with these deletions have worse overall survival. Agnelli 
and colleagues have compared myeloma samples with 
and without chromosome 13 deletions and identified 67 
down-regulated genes of which 44 maps to chromosome 
13 [22]. These genes are involved in protein biosyn- 
thesis, ubiquitination and transcriptional regulation. Many 
genes and pathways are differentially expressed between 
the HMM and NHMM subtypes, which have distinct 
CNA profiles [23,24]. These results support that CNAs 
have functional consequences through affecting gene ex- 
pression and may contribute to myeloma cell evolution 
through cancer-related pathways. 

When the extent of CNAs spans multiple chromo- 
somes and hundreds or thousands of genes, such as in 
myeloma, current dosage effect studies integrating gene 
and copy number data have limitations. First, they tend 
to identify pathways with broad functions such as prolifer- 
ation and apoptosis, whose gene members are enriched in 
the genes with dosage effect. Although these pathways 
provide insights into possible contribution of CNAs to 
cancer, they do not suggest specific therapeutic targets to 
improving current treatment regimes. Second, regulatory 
feed-back loops and dosage compensation mechanisms 
may attenuate the dosage effect of many genes [25], caus- 
ing them missed by correlation analysis between copy 
number and expression or differential expression analysis 
between CNA groups. 

Based on these observations, we hypothesize that 
gene-wise dosage effect varies as a result from complex 



regulatory networks that translate the impact of CNAs to 
gene expression, and studying this variation can provide 
insights into functional effects of CNAs. We propose a 
gene-wise dosage effect score to measure the prevalence 
of concordant copy number and expression changes 
across the samples in a dataset. Applied to two myeloma 
datasets, a genome-wide karyotype plot of the dosage ef- 
fect reveals the variation of dosage effect between genes 
and across chromosomes. We then study the potential 
causes of these variations in terms of gene s overall expres- 
sion level and chromosomes' CNA frequency. Unexpect- 
edly, four non-trisomy chromosomes (13, 14, 16, and 22) 
emerge as having the highest proportion of their genes 
that are dosage sensitive. Dosage sensitive genes and dos- 
age resistant genes are enriched with different functional 
pathways. In particular, the apoptosis pathway is enriched 
in dosage resistant genes, raising interesting hypothesis 
about the mechanism of this phenomenon and its utility 
as a prognosis marker. Taken together, our findings sup- 
port that recurrent CNAs in myeloma are selected by their 
functional consequences. The proposed dosage effect ana- 
lysis method will facilitate many genomics studies inte- 
grating gene copy number and expression data. 

Methods 

Microarray datasets 

We used two datasets providing paired genome-wide 
profiles of copy number alterations and gene expres- 
sion for newly diagnosed myeloma patients. The first is 
the IFM (Intergroupe Francophone du Myelome) dataset 
with 170 patients profiled in our laboratory by high-density 
Affymetrix 500 K SNP arrays and Exon ST 1.0 expres- 
sion arrays (GEO identifiers GSE12896 and GSE39754). 
Patient characteristics and clinical information have been 
reported previously [2]. The second is the Dickens dataset 
with 71 patients profiled by Aff)^metrix 500 K SNP ar- 
rays and Human Genome U133 Plus 2.0 expression arrays 
(GSE15695) [21]. 

Primary analysis of gene expression and copy 
number data 

The gene expression and copy number profiling data 
were preprocessed with the dChip software [26,27] to 
obtain expression and copy number estimates. The gene 
expression profiles were quantile-normalized and model- 
based signals were computed [26,27]. Gene probe set in- 
tensity values were averaged when more than one probe 
sets per gene were available. To assess the impact of al- 
ternative methods for handling Aff)^metrix array probe 
sets on Dosage effect scores, we also obtained alternative 
expression estimates for the IFM dataset from a CDF 
(HuEx-l_0-st-v2,coreR3,A20071112,EP.CDF) provided by 
Purdom et al [28]. This CDF, available form aroma, 
affymetrix website, groups Affymetrix "core" probesets into 
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18,708 transcript clusters. In this setting, each exon and 
probe is uniquely mapped to only one transcript cluster, 
allowing for a unique gene expression value for each indi- 
vidual gene after pre-processing and eliminating the need 
to use avarage value for each gene. 

A genes copy number was obtained by averaging the 
copy numbers of the SNPs within 5 kb of the genes 
transcription region. Further analysis was done with the 
R programming environment. We created gene expres- 
sion and copy number data matrices based on unique 
gene symbols, and then identified the gene symbols 
common between the two data matrices for dosage ef- 
fect analysis: 12836 genes for the IFM dataset and 12359 
genes for the Dickens dataset. 

Hyperdiploidy status 

Since there were no published hyperdiploidy data deter- 
mined by FISH for both datasets, the myeloma hyper- 
diploidy status was defined using the copy number 
information from SNP arrays. To determine the threshold 
for separating samples in to two subtypes, we plotted the 
distribution of the median copy number of the trisomy 
chromosomes. We observed a bimodal distribution with 
two peaks at 2 and 2.6. The HMM samples have a peak 
near 2.6 (and not 3) due to the normalization effect across 
arrays. We chose the midway value (2.3) between the two 
peaks as the cutoff to call HMM and NHMM samples 
from the copy number data. 77 of 170 IFM samples, and 
42 of 71 Dickens samples were called as HMM, and the 
rest samples were called as NHMM. 

Defining dosage effect score (DES) 

We defined a gene-wise dosage effect score (DES) to iden- 
tify genes whose copy number changes modulate their 
own expression. The DES is the ratio between "dosage ef- 
fect sample number" and "CNA sample number". The 
"CNA sample number" of a gene, is the number of samples 
in a dataset that have CNA at the gene (copy number < 1.7 
or > 2.3, considering signal noise and normal cell contam- 
ination). The "dosage effect sample number" of the gene is 
defined using two steps. First, using the samples with no 
copy number changes at the gene (copy number within 1.7 
and 2.3), the mean (normal_mean) and standard deviation 
(normal_sd) of the gene expression values is calculated. 
Next, we count the total number of samples that have 
CNA at this gene and their expression levels at the gene 
change more than random in the same direction as their 
CNA (i.e. the samples whose gene expression value are 
higher than normal_mean + normal_sd for amplification 
samples or lower than normal_mean - normal_sd for dele- 
tion samples). The number of these samples is the "dosage 
effect sample number" at the gene. 

Being a ratio, DES ranges between 0 and 1. Larger 
DES indicates stronger dosage effect at the gene in the 



sense that a larger proportion of CNA samples have 
concordant expression changes. To ensure reliability, 
DES is computed only for genes whose CNA sample 
number is greater than 10% of all samples. As an ex- 
ample, the gene EIF3K has CNA in 55.88% of samples in 
the IFM data set. Utilizing mean +/- 1 SD criteria, the 
percent of samples with both copy number alteration and 
concordant expression changes are 25.29%. EIF3K has 
the DES of 0.45 (.2559/.5588) in the IFM dataset. 

Differential and functional enrichment analysis 

Differential expression was identified with the LIMMA 
Bioconductor package, and Benjamini-Hochberg mul- 
tiple hypothesis correction was carried out using the R 
package multtest Differential gene expression was called 
at the adjusted p-value < 0.05. We used the DAVID web 
server for gene function enrichment analysis using gene 
ontology (GO) categories (GOTERM_BP_FAT, GOTERM_ 
CC_FATand GOTERM_MF_FAT) [29]. 

Results 

Differentially expressed genes between HMM and NHMM 
are enriched in the trisomy chromosomes 

A primary difference between the HMM and NHMM sub- 
types is the presence of three copies (trisomy) of eight spe- 
cific chromosomes in HMM and normal copy numbers for 
them in NHMM (Figure 1). To check whether the trisomy 
chromosomes influence the expression of the genes lo- 
cated within them, we compared the HMM and NHMM 
samples of the IFM dataset for differentially expressed 
genes. The trisomy chromosomes contained more HMM 
up-regulated genes than HMM down-regulated genes 
(Figure 2). Specifically, 744 of the 4702 (15.8%) tri- 
somy chromosome genes are up-regulated in HMM, 
while only 416 of the 8134 (5.1%) non- trisomy chromo- 
some genes are up-regulated in HMM (Fishers exact test 
p-value < 2.2e-16). 

In contrast, genes down- regulated in HMM are signifi- 
cantly depleted from the trisomy chromosomes (140 of 
1798 down-regulated genes locate in the trisomy chro- 
mosomes, p-value < 2.2e-16). Such enrichment is not 
due to that expressed genes in myeloma preferentially 
locate in the trisomy chromosomes. The reason is as fol- 
lows: when the expressed genes in myeloma is defined 
as the top 50% genes ranked by median expression level 
across the IFM samples, 51.4% (2418 genes) of the tri- 
somy chromosome genes are expressed and 49.2% (3999 
genes) of the non-trisomy chromosomes are expressed 
(p-value = 0.014). 

Chromosome Iq amplification is observed in -40% of 
myeloma patients (e.g. Figure 1) and many genes on 
lq21 have been reported as over-expressed when com- 
paring myeloma against normal. In the IFM data, we ob- 
serve much more down-regulated genes than up-regulated 
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genes from chromosome 1 when comparing HMM vs. 
NHMM samples (Figure 2). This can be due to more fre- 
quent amplification of chromosome Iq in NHMM than in 
HMM (Figure 1). These results suggest that chromosome 
copy number changes in myeloma could have direct effect 
on the expression level of a substantial number of genes. 

Widespread but variable dosage effect in myeloma 

To understand the global and spatial patterns of how 
copy number alterations (CNAs) affect gene expression 
in myeloma, we integrated paired gene expression and 
copy number profiling data from two myeloma datasets. 
We defined a dosage effect score (DES) for each gene as 
the proportion of CNA samples that have concordant 
copy number and gene expression changes (see Methods), 
and then visualized the gene-wise CNA frequencies 
and DES in a genome-wide karyotype plot (Figures 3A 
and 4A). The plots show that the dosage effect in mye- 
loma patients is widespread and there is an overall similar 
dosage effect patterns between the IFM and Dickens 
datasets. The impact of dosage on gene expression can be 
observed at most chromosomes, such as the trisomy chro- 
mosomes 9, 11 and 15, and chromosomes 13, 14 and 16, 
which contain hemizygously deleted regions. 

DES ranges between 0 and 1, and summarizes con- 
cordant copy number and gene expression changes. A 
value close to 0 and 1 indicates weak and strong dosage 
effect, respectively. The distribution of DES in the IFM 
data shows a large variation of DES across genes: 58% 
genes have low dosage effect (DES < 0.2, Figure 3B), 37% 
have medium dosage effect (DES between 0.2 and 0.4), and 



only 5.2% (611 genes) have high dosage effect (DES > 0.4). 
For the Dickens data the distribution is similar: 64%, 
30%, and 5.5% (655) genes have low, medium and high 
dosage effect, respectively (Figure 4B). The high dos- 
age effect genes from the two datasets have 123 genes 
in common (Fishers exact p-value < 2.2e-16). The top 
100 dosage-sensitive as well as resistant-genes for IFM 
and Dickens datasets are reported in Additional files 1 
and 2. Deletion in chromosome 13 is a prognostic fac- 
tor in MM [3]. Therefore genes showing strong dos- 
age effect on this chromosome may be important for 
MM. A list of dosage sensitive genes on this chromo- 
some is available in Additional file 3. ELOVL7, PTDSSl, 
POLRID, TBC1D22A, CCNC and TCF25 are some of the 
interesting genes with cancer related functions identified 
from the DES analysis. Additional file 4 summarizes cancer 
related functions based on PubMed for top 50 dosage sen- 
sitive genes in both IFM and Dickens datasets. 

Higher gene expression level is associated with stronger 
dosage effect 

To explain the variation of dosage effect for the genes 
with similar CNA frequencies, such as those in the same 
trisomy chromosome (Figure 3A), we correlated gene 
expression levels with the DES. The motivation is that 
some genes are expressed at higher level than others, 
and genes that are not expressed in myeloma will not 
exhibit any dosage effect. We first ordered all the genes 
by their median expression levels across samples and 
divided the genes into five equal-size groups (Figure 5, 
X-axis), and then compared the proportion of genes 
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having low, medium and high dosage effect across the 
groups (Figure 5, Y-axis). We found that for both datasets, 
higher expressed gene groups contain significant larger 
proportion of genes with high dosage effect (Cochran- 
Armitage p-value < 2.2e-16 for both datasets). 

Higher chromosomal CNA frequency is correlated with 
higher proportion of dosage sensitive genes 

The proportion of dosage sensitive genes (DES > 0.4) in 
each chromosome also varies across chromosomes, ran- 
ging from about 2% for chromosome 17 to >12% for 
chromosome 13 (Figure 6A). These proportions strongly 
correlate between the two datasets. Interestingly, the 
chromosomal proportion of dosage sensitive genes is 
highest for four non-trisomy chromosomes (chromo- 
somes 13, 14, 16 and 22) and lower for the trisomy chro- 
mosomes (marked red in Figure 6A). In contrast, the 
proportion of genes with medium dosage effect in each 



chromosome is not correlated between the two datasets 
(Figure 6B). 

Because the pattern of recurrent chromosomal gains and 
losses in myeloma are conserved across different datasets 
(Figures 1 and 6C), we asked whether the variation in the 
proportion of dosage sensitive genes across chromosomes 
is correlated with the myeloma CNA patterns. We com- 
pared the proportion of dosage sensitive genes with the 
proportion of genes with CNA across chromosomes. For 
non-trisomy chromosomes (marked blue in Figure 6D, 6E), 
we observed that larger CNA proportions associate with 
higher proportion of dosage sensitive genes. 

Both dosage sensitive genes and differentially expressed 
genes between HMM and NHMM are enriched by protein 
translation and localization processes 

To explore the functional consequences of dosage effect, 
we performed functional enrichment analysis for dosage 
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sensitive genes (DES > 0.4, genes listed in Additional file 1 
and 2 for the IFM and Dickens dataset respectively) using 
the DAVID gene ontology Web server. Translation, protein 
localization, and proteolysis processes are enriched in dos- 
age sensitive genes from the IFM dataset (Additional file 5: 
Table SI). Translation, protein localization and mitochon- 
drial processes are enriched in dosage sensitive genes from 
the Dickens dataset (Additional file 5: Table S2). 

Since the distinct CNA patterns in the HMM and 
NHMM subtypes can induce subtype specific expression 
changes through dosage effect, we asked how dosage ef- 
fect analysis compares to differential expression analysis. 
We compared 2958 differentially expressed genes be- 
tween HMM and NHMM (Figure 2) and 611 dosage 
sensitive genes (DES > 0.4) from the IFM dataset. Their 
overlap of 306 genes is highly significant (Fisher s exact 
test p-value < 2.2e-16). Gene ontology groups enriched 
in HMM up-regulated genes include translation and 



RNA-binding, and those enriched in NHMM up-regulated 
genes include protein localization, proteolysis, cell cycle, 
RNA processing and histone modification (Additional 
file 5: Table S3). These results suggest that differentially 
expressed genes between the two myeloma subtypes are 
mainly driven by dosage sensitive genes located in distinct 
CNA regions of the two subtypes. 

Dosage resistant genes are enriched by the apoptosis 
process 

Compared to dosage sensitive genes, dosage resistant 
genes (Additional file 1 and Additional file 2) may be 
subjected to more stringent dosage compensatory net- 
works such as feedback loops. Dosage resistant genes 
were defined as the top 20% of highly expressed genes 
(Group 5 from Figure 5) that have DES < 0.2 (genes 
listed in Additional files 1 and 2). Gene ontology enrich- 
ment analysis of dosage resistant genes identified processes 
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such as translation and organelle lumen, which are com- 
monly enriched in both dosage sensitive and resistant 
genes. However, apoptosis emerged as a distinct process 
only enriched in dosage resistant genes in both datasets 
(Additional file 5: Table S4, Table S5). 

Alternative pre-processing of Affymetrix Exon arrays does 
not impact dosage effect score substantially 

It is valuable to assess impact of alternative methods for 
handling Affymetrix Exon array probe sets on DES. 
Therefore, we obtained alternative expression estimates 
for the IFM dataset from a CDF provided by Purdom 
et al [28]. For the IFM dataset, the Pearson correlation 
coefficient for the mean of gene expression estimates 
across the samples of dChip and aroma.affymetrix normal- 
ized data is 0.86 (12192 genes; p < 2.2e-16) and the correl- 
ation for DES scores is 0.81 (10960 genes; p < 2.2e-16). 



Moreover, the results from functional enrichment ana- 
lysis of DES scores from the IFM dataset derived using 
aroma. affymetrix expression estimates overlap signifi- 
cantly with those obtained from dChip expression esti- 
mates (Additional file 5: Figure SIA and SIB, Additional 
file 5: Table S6 and S7). In addition, we can observe simi- 
lar DES patterns for chromosomal locations (Figure 3 and 
Additional file 5: Figure S2). Please see Additional file 5 
for detailed enrichment results. Thus, conclusions from 
our analysis are robust against different pre-processing 
methods. 

Discussion 

Cancer genomes such as those of multiple myeloma har- 
bor many types of genomic aberrations. Among them, 
CNAs of entire chromosomes or focal chromosomal re- 
gions have been extensively detected from many cancer 
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Figure 6 The correlation between dosage sensitive genes and CNA frequency across chromosomes. (A) The proportion of genes in each 
chromosome that are dosage sensitive (DES > 0.4). X-axis: the IFM dataset, Y-axis: the Dickens dataset. (B) The proportion of genes in each 
chromosome that have medium dosage effect (0.2 < = DES < = 0.4). X-axis: the IFM dataset, Y-axis: the Dickens dataset. (C) Copy number 
alteration frequency in each chromosome compared between the two datasets. Each point represents the CNA (copy number <1.7 or > 2.3) 
frequency of a gene across samples, averaged over all the genes in a chromosome. (D, E) The proportion of dosage sensitive genes (Y-axis) is 
plotted against the average CNA frequency (X-axis) for different chromosomes, for the IFM data (D) and the Dickens data (E). 





types in the past decade through microarrays and mas- 
sively parallel sequencing technologies. Paired copy num- 
ber and gene expression profiling of the same cancer 
samples has enabled integrative analysis identifying driver 
oncogenes, improving classification of cancer subtypes, 
and provided better understanding of molecular pathways 
dysregulated in cancer [4,15,19,30]. In myeloma genomes, 
prevalent and recurrent patterns of 8 trisomy chromo- 
somes and deletions of specific chromosomes (Ip, 6q, 8p, 
13, 16q; see Figure 1) provide a model system to study the 
effect of CNAs on gene expression. Although previous 
studies hinted at the copy number dosage effect in various 
cancer types including myeloma [11,18], in this study we 
have for the first time defined dosage effect score, 
highlighted its variation across genes and chromosomes, 
and studied the potential consequence of these variations 
in terms of dosage sensitive genes and dosage-resistant 
genes. Other genomic changes such as copy-neutral loss of 



heterozygosity can be inferred from SNP array data, and 
they could affect gene expression levels. Study of these al- 
terations can be followed up in the future. 

Defining a gene-wise dosage effect score (DES) allows 
us to compare and visualize the impact of CNAs on gene 
expression at the level of individual genes, across the 
genome and between datasets. The DES score was calcu- 
lated only for those genes that have copy number alter- 
ations in at least 10% of patients and vary their expression 
beyond the mean expression -i-/- one SD, as a safeguard in 
the process. Because cis -regulatory factors may reduce the 
gene expression output even if the copy number has in- 
creased, using higher SD values could result in significantly 
higher amount of false negatives. In practice, users of our 
method can use any suitable threshold of choice. The 
visualization of distribution of DES along chromosomes 
suggests that dosage effect in myeloma is widespread and 
variable across the genome (Figures 3 and 4). Most genes 
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show low or moderate dosage effect, and only about 5% of 
genes are highly dosage sensitive. We analyze two contrib- 
uting causes for such variation: highly expressed genes are 
more likely to have higher dosage effect (Figure 5), and 
higher frequency of CNAs in a chromosome correlates 
with higher proportion of dosage sensitive genes (Figure 6D 
and 6E). Interestingly, although three non-trisomy chro- 
mosomes (14, 16, 22) have lower frequency of CNAs than 
the 8 trisomy chromosomes, the proportion of dosage sen- 
sitive genes in these non-trisomy chromosomes are higher 
than in the trisomy chromosomes (Figure 6D and 6E). This 
could be due to that 3-copy gains in trisomy chromosomes 
lead to 1.5 fold expression changes, while 1-copy losses 
in non-trisomy chromosomes lead to 2 fold expression 
changes. The larger relative impact on gene expression 
of copy number losses may lead to higher proportion of 
dosage sensitive genes in non-trisomy chromosomes. 

Dosage sensitive genes are enriched by genes with 
function in protein translation and localization. Recently, 
dosage effects of protein translation and transport genes 
are reported for plasma cell leukemia, a cancer closely 
related to myeloma [31]. Overdrive of translation machines 
is a feature of cancer cell proliferation [32-34]. Dosage sen- 
sitivity of the genes in translation processes indicates their 
expression levels are not subject to tightly controlled dos- 
age compensation and are susceptible to the influence of 
copy number alteration. These results support that CNA 
patterns in myeloma, such as the eight characteristic tri- 
somy chromosomes in about half of all myeloma patients, 
are not a by-product of myeloma genomic evolution but 
have functional and consequential effects that contribute 
to the pathogenesis of myeloma. 

We find that gene ontology groups enriched in dosage 
sensitive genes are similar to those enriched in the dif- 
ferential expressed genes between HMM and NHMM 
subtypes (Additional file 5: Tables SI, S2 and S3), and 
these two gene sets overlap significantly. Previous gene 
expression studies comparing HMM and NHMM have 
found similar enriched pathways. Chng et al reported 
over-expression of ribosomal protein and protein biosyn- 
thesis genes in HMM relative to NHMM [24]. Agnelli 
et, al reported that up-regulated genes in HMM com- 
pared to NHMM are mainly involved in protein biosyn- 
thesis, encoding for ribosomal proteins and mitochondrial 
ribonucleoproteins [23], and these genes mainly mapped to 
the hyperdiploid chromosomes. Therefore, distinct CNA 
patterns in HMM and NHMM are likely to be the cause 
for most differentially expressed genes between the two 
subtypes through dosage effect. 

We also identify specific dosage sensitive genes that 
may play critical roles in myeloma development. For ex- 
ample (Additional files 1, 2 and 3), Tamura et al, [35] 
showed that EVOLV7 could be involved in prostate cancer 
growth and survival. Camps et al, [36] identified that loss 



of POLRID function affects cell viability in colorectal can- 
cers. Cell cycle role of CCNC and its effect on several can- 
cer types were reported by several authors [37,38]. Steen 
et al, previously reported the role of NULPl (TCG25) in 
cell death control and tumor growth [39]. Additional file 4 
includes literature search for the top 50 dosage sensitive 
genes for both IFM and Dickens dataset. 

A less studied phenomenon in integrative genomics is 
dosage resistance or compensation. Essential genes whose 
expression level must be tightly regulated may achieve 
dosage resistance through feedback loops that compensate 
for copy number changes [40] . A recent study reports that 
amplified genes may be over-expressed, unchanged or 
down-regulated in cervical cancer [41]. Signaling and 
receptor activity functions are enriched in genes whose ex- 
pression negatively correlates with copy number in glio- 
blastoma [42]. In our analysis, we found apoptosis as an 
enriched category in dosage resistant genes but not in dos- 
age sensitive genes. The gene ontology category "apoptosis" 
encompasses a large number of genes with both "signaling" 
and other functions. This ontology as any categorization is 
subjective and incomplete with gaps in knowledge. The 
apoptosis pathway is often mutated or inhibited in cancer 
cells for its tumor suppressing function [43]. Dickens et al, 
have identified 170 "cell death" signature genes whose 
homozygous deletions and corresponding loss of expres- 
sion in myeloma are adverse to overall survival [21]. These 
results point to future studies on differential dosage sensi- 
tivity and resistance of pro- and anti-proliferation pathways 
and their variation across patients as therapeutic targets 
and prognosis markers. 

There are several limitations of this study. First, we 
haven't considered indirect effects of CNAs. CNAs could 
directly alter the expression level of transcription factors 
and signaling transduction genes, which in turn affect their 
downstream genes' expression. Partly alleviating this limi- 
tation, our identification of dosage-resistant genes may be 
reflection of indirect effects. Second, parameters and 
thresholds used in the analysis are chosen based on empir- 
ical distributions and consideration for reducing noise, 
such as computing DES only for genes with CNA fre- 
quency >10% and regarding genes as dosage-sensitive at 
DES > 0.4, but choosing other parameters may lead to dif- 
ferent results. We have used DSE > 0.3 to select dosage 
sensitive genes and obtained similar enriched pathways in 
dosage sensitive genes. The agreement between the two 
myeloma datasets with different patient cohorts also sup- 
ports the robustness of the analysis method and parame- 
ters used. These limitations point to directions for further 
improvement of dosage effect analysis. 

Conclusions 

In summary, dosage effect score is a gene-wise meas- 
ure that integrates copy number alteration with gene 
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expression in a quantitative manner. It could be an im- 
portant tool to understanding the functional impact of 
CNAs in tumor development. Our results indicate that 
CNAs in myeloma impact the gene expression level of a 
substantial number of genes. Interestingly, our analysis 
suggests that even though differentially expressed genes 
between HMM and NHMM are enriched in the trisomy 
chromosomes, chromosomal proportion of dosage sensi- 
tive genes is highest for non-trisomy chromosomes. Also, 
higher expression levels associated with stronger dosage 
effect. The dosage sensitive genes are enriched by protein 
translation and localization processes, and dosage resistant 
genes are enriched by genes with function in apoptosis. 
These results support that CNAs exert functional effect in 
cancer transcriptome in part through direct dosage effect 
of CNA-affected genes, and therefore recurrent CNAs can 
be selected during cancer clonal evolution due to their 
dosage effect on specific pathways. The gene lists identi- 
fied in our analysis will help identify functional targets in 
myeloma. The R code is available at http://www.canevolve. 
org/dosageEffect/ for applying this method to other stud- 
ies profiling paired gene expression and copy number can- 
cer samples. 

Availability of supporting data 

All supporting data are included as Additional files 1, 2, 3, 
4 and 5. The data sets supporting the results of this article 
are available in the Gene Expression Omnibus (GEO) re- 
pository, GSE12896, GSE39754 and GSE15695 at http:// 
www.ncbi.nlm.nih.gov/geo/. 
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